Achieving accurate and context-sensitive timing for code optimization
Authors
Abstract
Key computational kernels must run near their peak efficiency for most high-performance computing (HPC) applications. Achieving this level of efficiency has always required extensive tuning of the kernel on the particular platform of interest. The success or failure of an optimization is usually measured by invoking a timer. Understanding how to build reliable and context-sensitive timers is one of the most neglected areas in HPC, and this results in a host of HPC software that looks good when reported in papers but delivers only a fraction of the reported performance when used by actual HPC applications. In this paper we motivate the importance of timer design, and then discuss the techniques and methodologies we have developed to accurately time HPC kernel routines for our well-known empirical tuning framework, ATLAS.

This work was supported in part by National Science Foundation CRI grant SNS-0551504.
Contact: [whaley,castaldo]@cs.utsa.edu
UTSA/CS Technical Report CS-TR-2008-001
Similar resources
Transducer composition for context-dependent network expansion
Context-dependent models for language units are essential in high-accuracy speech recognition. However, standard speech recognition frameworks are based on the substitution of lower-level models for higher-level units. Since substitution cannot express context-dependency constraints, actual recognizers use restrictive model-structure assumptions and specialized code for context-dependent models,...
Timing and Code Size Optimization on Achieving Full Parallelism in Uniform Nested Loops
Multidimensional retiming is one of the most important optimization techniques for improving the timing parameters of nested loops. It consists of exploring the iterative and recursive structures of loops to redistribute computation nodes across cycle periods, and thus to achieve full parallelism. However, this technique introduces a large overhead in loop generation due to the loop transformation. The ...
Results of a Fortran code for calculating glandular dose in mammography using Sobol-Wu parameters
Background: Accurate computation of the radiation dose to the breast is essential to mammography. The thickness of the breast, the composition of the breast tissue, and other variables affect the optimal breast dose. Furthermore, the glandular fraction, which refers to the composition of the breasts, as partitioned between radiation-sensitive glandular tissue and the adipose tissue, also h...
Optimizing parallelism for nested loops with iterational and instructional retiming
Embedded systems have strict timing and code size requirements. Retiming is one of the most important optimization techniques to improve the execution time of loops by increasing the parallelism among successive loop iterations. Traditionally, retiming has been applied at instruction level to reduce cycle period for single loops. While multi-dimensional (MD) retiming can explore the outer loop ...
Optimizing Nested Loops with Iterational and Instructional Retiming
Embedded systems have strict timing and code size requirements. Retiming is one of the most important optimization techniques to improve the execution time of loops by increasing the parallelism among successive loop iterations. Traditionally, retiming has been applied at instruction level to reduce cycle period for single loops. While multi-dimensional (MD) retiming can explore the outer loop ...
Journal title: Softw., Pract. Exper.
Volume: 38
Publication date: 2008